feat: enabling dnstt#497
Conversation
- Goroutine storm fixed: DNSTT instances now created lazily inside pool workers (max 10 at a time = 320 goroutines, down from 5,216) - RT caching: probe's http.Transport reused for requests, avoiding a second CONNECT tunnel that causes 502 on the same smux session - probeCancelFn: Close() cancels in-progress probe workers promptly so their DoH goroutines don't starve subsequent probes - waitFor raised to 2 minutes to allow full session establishment (~20-60s) - DNSTT type alias exported so callers don't need a direct import of github.com/getlantern/dnstt
also running go mod tidy
Replace instant-death markFailed with a softer consecutiveFailures counter (5 strikes). When a tunnel fails, clear the cached probe round tripper so the next attempt gets a fresh smux stream instead of reusing a degraded transport. Close permanently dead tunnels instead of leaking them via 'continue'. isSucceeding now requires both that the tunnel has ever succeeded AND is under the failure threshold — the old || logic would recycle tunnels that had never worked. Add an on-demand probe trigger: when NewRoundTripper exhausts all usable tunnels it signals probeCh, which findWorkingDNSTunnels picks up to start a fresh cycle immediately instead of waiting for the timer. Reduce the probe interval from 30 min to 5 min so re-probes happen even without a trigger signal. Also wire domain/resolver into the dnsTunnel struct so RoundTrip logs identify which tunnel handled each request.
DNSTT sessions run over DNS at 135-byte MTU; a single TLS handshake can take 30-60 seconds through the tunnel. The race transport's default 80s GET budget and 3min POST budget are often too tight. Add RequestTimeout() returning 5 minutes so kindling's race transport gives this transport enough time to complete. The upstream kindling change (../kindling) wires this into the race budget. Also replace the kindling dependency with a local path so these changes take effect together.
The config fetcher and kindling HTTP client both use this timeout as an outer bound around the race transport. With DNSTT requesting a 5-minute race budget, the outer timeout of 180s was killing requests before the tunnel could complete — response headers were arriving but body reads were cut off mid-stream. 360s gives DNSTT 5min with 1min of headroom.
Removed the probeRT field from dnsTunnel and all related methods (setProbeRT, getRoundTripper, clearProbeRT). The probe still validates the tunnel but no longer caches its smux session — each real request now creates a fresh round tripper via NewRoundTripper. This fixes the 'io: read/write on closed pipe' cascade: when multiple concurrent requests shared the probe's cached smux session, a single KCP disconnect killed all streams simultaneously. With per-request sessions, each request has isolation — one dying session doesn't take down others.
Sets a 10KB body size cap on the DNSTT transport so the race transport skips it for oversized payloads (e.g. 1.4MB issue report protobufs). DNS tunnels have a 135-byte MTU — 10KB already needs ~75 DNS queries and nearly exhausts the 5-minute RequestTimeout budget. Larger payloads should route through another transport.
Moves DNSTT registration after Smart/AMP/Domainfront so it only races if the other transports fail to connect. Also re-enables all transports now that DNSTT is no longer the sole transport.
Previously, NewRoundTripper returned a connectedRoundtripper without establishing any connection — the actual smux session was created lazily in RoundTrip. This meant DNSTT's NewRoundTripper returned instantly and always won the connection race against Smart/AMP/Domainfront that actually pre-connect. Now NewRoundTripper calls tun.NewRoundTripper during the connect phase so DNSTT pays the real connection cost (KCP + smux handshake) during the race. connectedRoundtripper stores the pre-built RoundTripper and uses it directly.
When a tunnel fails to create a round tripper, the DNSTT instance's goroutines (KCP poll loop, DNS senders) would keep running since the tunnel was consumed from the channel but never closed. This leaked resources and shrunk the available tunnel pool until the next probe cycle. Now closed promptly. Also add blank line separator between immutable fields (domain, resolver) and mutex-guarded fields (lastSucceeded, consecutiveFailures) for clarity.
| Name = "lantern" | ||
|
|
||
| DefaultHTTPTimeout = (60 * time.Second) | ||
| DefaultHTTPTimeout = (360 * time.Second) |
There was a problem hiding this comment.
This is used by more than DNSTT, so we should probably define a separate variable for DNSTT timeout.
| pool := pond.New(poolSize, 10, pond.Context(pondCtx)) | ||
| for _, dnst := range m.options { | ||
| for _, cfg := range m.configs { | ||
| cfg := cfg |
There was a problem hiding this comment.
cfg := cfg not necessary since go v1.22.
| // TLS handshake through 135-byte MTU tunnel (multiple round trips). | ||
| client := &http.Client{Transport: rt, Timeout: 180 * time.Second} | ||
|
|
||
| req, err := http.NewRequestWithContext(pondCtx, http.MethodGet, "http://www.gstatic.com/generate_204", http.NoBody) |
There was a problem hiding this comment.
Did you mean to change to HTTP instead of HTTPS? Since one issue was that tunnels timeout/fail during the TLS handshake (due to transfer speed, packet size, etc.), we should use a request equivalent to what will be sent when actually running. Otherwise, the test is kind of meaningless.
There was a problem hiding this comment.
Yes, I've changed to HTTP because there was too many failures with dns tunnels while using HTTPS for testing and this test basically create a session and keep it open to further requests to just use the same open session
There was a problem hiding this comment.
Oh OK. Is that already documented somewhere? If not, would you add a comment in the function doc explaining that this is used to eagerly establish connections?
|
|
||
| m := &multipleDNSTTTransport{ | ||
| tunChan: make(chan *dnsTunnel, 400), | ||
| tunChan: make(chan *dnsTunnel, 400), |
There was a problem hiding this comment.
400 is kind of high, especially given the recent iOS memory issues. Since tunChan holds successful tunnels, we only need to retain a few, <10. Once we find enough working ones we can stop testing the rest, then only we start testing again from were we left off to find replacements for ones that stop working upto the max.
| tun.markSucceeded() | ||
| if !m.closed.Load() { | ||
| m.tunChan <- tun | ||
| } else { |
There was a problem hiding this comment.
This builds and adds a new DNSTT instance for successful ones but they're only removed when they fail. So eventually tunChan will fill up and no more can be added, resulting in this blocking until one does fail and is removed and leaking if that doesn't happen. We should skip finding new tunnels if we have some working ones already (5 is reasonable) and only replace broken ones. If we want to establish new tunnels when we receive a new config, we should completely replace existing ones, while not exceeding the max open tunnels.
| // NewRoundTripper creates a pre-connected HTTP round tripper for the given | ||
| // address. It blocks until a KCP session and smux stream are established so | ||
| // that the race transport can fairly compare connection latencies. |
There was a problem hiding this comment.
It blocks until a KCP session and smux stream are established so that the race transport can fairly compare connection latencies.
The KCP session and smux streams are actually established lazily — on first request. So this isn't correct.
| rt, err := tun.NewRoundTripper(ctx, addr) | ||
| if err != nil { | ||
| tun.recordFailure() | ||
| tun.Close() | ||
| slog.WarnContext(ctx, "dnstt roundtripper creation failed during connect", | ||
| "domain", tun.domain, "resolver", tun.resolver, "error", err) |
There was a problem hiding this comment.
NewRoundTripper only returns an error if the dnstt instance is already closed. So there's no need to double-close it here or record the failure.
| m.probeCancelMx.Lock() | ||
| m.probeCancelFn = cancel | ||
| m.probeCancelMx.Unlock() |
There was a problem hiding this comment.
tryAllDNSTunnels is triggered by findWorkingDNSTunnels (immediately), NewRoundTripper (via probeCh), and by the timer, which can overlap. Since probeCancelFn is overwritten here each time, it can only cancel the most recently triggered probe. We should use a flag to signal that it's already running and prevent subsequent runs from being triggered.
This pull request introduces significant improvements to the DNSTT (DNS tunnel) transport in the
kindlingsubsystem, enhances reliability and performance under censorship, and updates several dependencies. The DNSTT transport is now enabled by default, features more robust tunnel management and probing, and better integrates with the system's bypass logic. Additionally, HTTP timeouts have been increased, and dependencies have been updated for improved stability and compatibility.DNSTT Transport Improvements:
EnabledTransports, making it available for use in production scenarios.bypass.DialContextdialer, ensuring that DoH/DoT connections do not loop back through the VPN tunnel.RequestTimeoutandMaxLengthmethods to DNSTT transport, tuning request timeouts and maximum body size for the high-latency, low-MTU nature of DNS tunnels.Kindling Client and Transport Integration:
NewKindlinghas been updated, moving the Smart transport configuration before DNSTT and ensuring correct option ordering. [1] [2]General Configuration and Constants:
Dependency Updates:
go.mod, includinggithub.com/getlantern/dnstt,go.opentelemetry.io/otel,golang.org/x/crypto, and others, to newer versions for improved performance and compatibility. [1] [2] [3] [4]parser_test.goto reflect the refactored DNSTT transport internals (configsinstead ofoptions). [1] [2] [3] [4]Code Structure and Typing:
DNSTTin the parser to simplify imports and usage.These changes collectively make the DNSTT transport more robust, production-ready, and easier to maintain, while also keeping the codebase up-to-date with dependency improvements.